@elmiko
Contributor

@elmiko elmiko commented Sep 10, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR adds a new lister for ready unschedulable nodes and connects that lister to a new parameter in the node info processors' Process function. This change enables the autoscaler to use unschedulable, but otherwise ready, nodes as a last resort when creating node templates for scheduling simulation.
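To make "ready unschedulable" concrete, here is a minimal sketch of the filtering such a lister performs. It assumes a hypothetical, simplified Node type (not apiv1.Node) where Ready stands in for a True NodeReady condition and Unschedulable stands in for .spec.unschedulable; readyUnschedulableNodes is an illustrative name, not the PR's code.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for apiv1.Node; only the
// two properties this filter looks at are modeled.
type Node struct {
	Name          string
	Ready         bool // stands in for a True NodeReady condition
	Unschedulable bool // stands in for .spec.unschedulable (a cordoned node)
}

// readyUnschedulableNodes is a hypothetical filter: it keeps nodes that are
// Ready but cordoned, i.e. nodes usable only as a last-resort template.
func readyUnschedulableNodes(nodes []Node) []Node {
	var out []Node
	for _, n := range nodes {
		if n.Ready && n.Unschedulable {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "a", Ready: true},
		{Name: "b", Ready: true, Unschedulable: true},
		{Name: "c", Unschedulable: true},
	}
	for _, n := range readyUnschedulableNodes(nodes) {
		fmt.Println(n.Name) // prints "b"
	}
}
```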

Which issue(s) this PR fixes:

Fixes #8380

Special notes for your reviewer:

I'm not sure if this is the best way to solve this problem, but i am proposing this for further discussion and design.

Does this PR introduce a user-facing change?

Node groups where all the nodes are ready but unschedulable will be processed as potential candidates for scaling when simulating cluster scheduling.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/needs-area cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 10, 2025
@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-area labels Sep 10, 2025
@elmiko
Contributor Author

elmiko commented Sep 10, 2025

i'm working on adding more unit tests for this behavior, but i wanted to share this solution so we could start talking about it.

@elmiko elmiko force-pushed the unschedulable-nodes-fix branch from a0ebb28 to 3270172 Compare October 2, 2025 20:50
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 2, 2025
@elmiko
Contributor Author

elmiko commented Oct 2, 2025

i've rewritten this patch to use all nodes as the secondary value instead of using a new list of ready unschedulable nodes.

@elmiko elmiko changed the title WIP update to include unschedulable nodes update node info processors to include unschedulable nodes Oct 2, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2025
@elmiko
Contributor Author

elmiko commented Oct 2, 2025

i need to do a little more testing on this locally, but i think this is fine for review.

// Last resort - unready/unschedulable nodes.
for _, node := range nodes {
// we want to check not only the ready nodes, but also ready unschedulable nodes.
for _, node := range append(nodes, allNodes...) {
Contributor Author

i'm not sure it's appropriate to append these; theoretically, allNodes should already contain everything in nodes. i'm going to test this out using just allNodes.

Contributor Author

due to filtering that happens in obtainNodeLists, we need to combine both lists of nodes here.
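Combining the two lists with append(nodes, allNodes...) has two Go-specific pitfalls: if nodes has spare capacity, append writes into its existing backing array, which other slices may share, and any node present in both lists is visited twice (the double counting mentioned later in this thread). A minimal sketch of a safer merge, assuming a hypothetical Node type keyed by Name and a hypothetical combineNodes helper:

```go
package main

import "fmt"

// Node is a hypothetical stand-in for apiv1.Node, keyed by Name.
type Node struct{ Name string }

// combineNodes returns readyNodes followed by every node of allNodes not
// already seen (matched by name). It always allocates a new slice, so the
// backing array of readyNodes is never written to.
func combineNodes(readyNodes, allNodes []Node) []Node {
	out := make([]Node, 0, len(readyNodes)+len(allNodes))
	seen := make(map[string]bool, len(readyNodes)+len(allNodes))
	for _, list := range [][]Node{readyNodes, allNodes} {
		for _, n := range list {
			if seen[n.Name] {
				continue // skip duplicates so no node is processed twice
			}
			seen[n.Name] = true
			out = append(out, n)
		}
	}
	return out
}

func main() {
	ready := []Node{{"a"}, {"b"}}
	all := []Node{{"a"}, {"b"}, {"c"}}
	fmt.Println(combineNodes(ready, all)) // [{a} {b} {c}]
}
```

Keeping readyNodes first preserves the preference order, so ready nodes are still considered before the extra nodes from allNodes.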

@elmiko elmiko force-pushed the unschedulable-nodes-fix branch from 3270172 to cb2649a Compare October 3, 2025 16:37
@elmiko
Contributor Author

elmiko commented Oct 3, 2025

i updated the argument names in the Process function to make the source of the nodes more clear. i also changed the mixed node info processor to not double count the nodes for the unschedulable/unready detection clause.

@elmiko
Contributor Author

elmiko commented Oct 3, 2025

it seems like the update to the mixed node processor needs a little more investigation.

@elmiko elmiko force-pushed the unschedulable-nodes-fix branch from cb2649a to fd53c0b Compare October 3, 2025 16:59
@elmiko
Contributor Author

elmiko commented Oct 3, 2025

it looks like we need both the readyNodes and allNodes lists due to the filtering that happens in the core.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2025
@elmiko elmiko force-pushed the unschedulable-nodes-fix branch from fd53c0b to 906a939 Compare October 8, 2025 18:44
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 8, 2025
@elmiko
Contributor Author

elmiko commented Oct 8, 2025

rebased

@elmiko
Contributor Author

elmiko commented Oct 14, 2025

@jackfrancis @towca any chance at a review here?

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 24, 2025
@elmiko
Copy link
Contributor Author

elmiko commented Oct 24, 2025

refactored to put the unschedulable flag clearing behind a flag. i'm not totally happy with this solution as it feels a little sneaky to add the boolean value to the TaintConfig struct. but, it is cleaner this way.

@elmiko elmiko force-pushed the unschedulable-nodes-fix branch 2 times, most recently from c1f4a88 to 57994b4 Compare October 24, 2025 20:34
newNode.Labels[apiv1.LabelHostname] = newName

if taintConfig != nil && taintConfig.ShouldIgnoreNodeUnschedulable() {
newNode.Spec.Unschedulable = false
Collaborator

nit: IMO it'd read a bit better if the ifs were nested.
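For illustration, the nested form suggested in this nit could look like the sketch below. It assumes a hypothetical TaintConfig that models only the new boolean, a hypothetical simplified Node, and a hypothetical sanitizeTemplateNode helper; only the ShouldIgnoreNodeUnschedulable accessor name comes from the diff above, and the names may have changed in the later rename.

```go
package main

import "fmt"

// TaintConfig is a hypothetical stand-in for the autoscaler's TaintConfig;
// only the boolean added in this PR is modeled here.
type TaintConfig struct {
	ignoreNodeUnschedulable bool
}

// ShouldIgnoreNodeUnschedulable mirrors the accessor name used in the diff.
func (tc *TaintConfig) ShouldIgnoreNodeUnschedulable() bool {
	return tc.ignoreNodeUnschedulable
}

// Node is a hypothetical, simplified template node.
type Node struct {
	Name          string
	Unschedulable bool // stands in for .spec.unschedulable
}

// sanitizeTemplateNode copies node under a new name and, only when the
// config asks for it, clears the unschedulable bit. The nil check and the
// option check are nested, as suggested in the review.
func sanitizeTemplateNode(node Node, newName string, taintConfig *TaintConfig) Node {
	newNode := node // Node holds no pointers, so this is a full copy
	newNode.Name = newName
	if taintConfig != nil {
		if taintConfig.ShouldIgnoreNodeUnschedulable() {
			newNode.Unschedulable = false
		}
	}
	return newNode
}

func main() {
	cordoned := Node{Name: "worker-1", Unschedulable: true}
	fmt.Println(sanitizeTemplateNode(cordoned, "template-1", nil).Unschedulable)                                          // true
	fmt.Println(sanitizeTemplateNode(cordoned, "template-1", &TaintConfig{ignoreNodeUnschedulable: true}).Unschedulable) // false
}
```

Nesting keeps the nil guard as a single block, which leaves room for other taint-config-driven sanitization to sit under the same check.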

balancingIgnoreLabelsFlag = multiStringFlag("balancing-ignore-label", "Specifies a label to ignore in addition to the basic and cloud-provider set of labels when comparing if two node groups are similar")
balancingLabelsFlag = multiStringFlag("balancing-label", "Specifies a label to use for comparing if two node groups are similar, rather than the built in heuristics. Setting this flag disables all other comparison logic, and cannot be combined with --balancing-ignore-label.")
awsUseStaticInstanceList = flag.Bool("aws-use-static-instance-list", false, "Should CA fetch instance types in runtime or use a static list. AWS only")
ignoreNodeUnschedulable = flag.Bool("ignore-node-unschedulable", false, "Specifies that the CA should ignore a node's .spec.unschedulable field in node templates when considering to scale a node group.")
Collaborator

nit: IMO the flag/option name is pretty vague and you need the description to understand it. On the other hand I can't think of a different name that isn't horribly verbose (something like --assume-template-node-always-schedulable comes to mind) 😅

@jackfrancis WDYT?

Contributor

I probably prefer something more "positive" like scaleFromUnschedulable

Contributor Author

that sounds fine to me, would --scale-from-unschedulable be an acceptable flag then?

@towca
Collaborator

towca commented Oct 28, 2025

Thanks a lot for all the work on this PR @elmiko!

refactored to put the unschedulable flag clearing behind a flag. i'm not totally happy with this solution as it feels a little sneaky to add the boolean value to the TaintConfig struct. but, it is cleaner this way.

IMO this is perfectly fine, the unschedulable bit is very similar to a taint in the first place. I'm not a fan of the flag/option name, but I also don't have ideas that are definitely better so I'll leave that up to you/other reviewers.

LGTM, holding for other reviewers.

/lgtm
/hold
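On the point that the unschedulable bit is very similar to a taint: in Kubernetes, a node with .spec.unschedulable set also carries the node.kubernetes.io/unschedulable:NoSchedule taint, and the scheduler lets a pod land on such a node only if the pod tolerates that taint. The sketch below models that check with hypothetical, trimmed-down types (Toleration, tolerates, and podFitsUnschedulable are illustrative, not the k8s.io/api types or scheduler code); only the taint key and effect strings are the real ones.

```go
package main

import "fmt"

// Real Kubernetes taint key/effect for cordoned nodes; everything else in
// this sketch (types and helpers) is a simplified, hypothetical model.
const (
	taintNodeUnschedulable = "node.kubernetes.io/unschedulable"
	effectNoSchedule       = "NoSchedule"
)

// Toleration is a trimmed-down model of a pod toleration.
type Toleration struct {
	Key      string
	Operator string // "Exists" or "Equal" ("" is treated as "Equal")
	Value    string
	Effect   string // "" matches every effect
}

// tolerates reports whether t matches a taint with the given key, an empty
// value, and the given effect.
func tolerates(t Toleration, key, effect string) bool {
	if t.Effect != "" && t.Effect != effect {
		return false
	}
	if t.Key == "" {
		// An empty key is only meaningful with Exists, where it matches all taints.
		return t.Operator == "Exists"
	}
	if t.Key != key {
		return false
	}
	// The unschedulable taint has no value, so Exists or an empty Equal matches.
	return t.Operator == "Exists" || t.Value == ""
}

// podFitsUnschedulable models the check: a cordoned node only accepts pods
// that tolerate the unschedulable NoSchedule taint.
func podFitsUnschedulable(nodeUnschedulable bool, tolerations []Toleration) bool {
	if !nodeUnschedulable {
		return true
	}
	for _, t := range tolerations {
		if tolerates(t, taintNodeUnschedulable, effectNoSchedule) {
			return true
		}
	}
	return false
}

func main() {
	tol := []Toleration{{Key: taintNodeUnschedulable, Operator: "Exists", Effect: effectNoSchedule}}
	fmt.Println(podFitsUnschedulable(true, nil)) // false
	fmt.Println(podFitsUnschedulable(true, tol)) // true
}
```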

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2025
@towca
Collaborator

towca commented Oct 28, 2025

Missed the

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 28, 2025
@elmiko
Contributor Author

elmiko commented Oct 28, 2025

thanks @towca , i'm happy to adjust the names to make it easier to understand.

This change introduces a flag which will instruct the CA to ignore a
node's `.spec.unschedulable` field when creating node templates for
considering which node group to scale.
@elmiko elmiko force-pushed the unschedulable-nodes-fix branch from 57994b4 to 4c4511b Compare October 28, 2025 21:11
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2025
@elmiko
Contributor Author

elmiko commented Oct 28, 2025

updated to rename the flag and related variables to --scale-from-unschedulable
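The flag is registered with Go's standard flag package (the earlier diff uses flag.Bool for the original ignore-node-unschedulable name). A minimal sketch of the renamed boolean flag on its own FlagSet, assuming a hypothetical parseScaleFromUnschedulable helper and paraphrased help text rather than the merged code:

```go
package main

import (
	"flag"
	"fmt"
)

// parseScaleFromUnschedulable is a hypothetical helper that registers only
// the renamed flag on its own FlagSet; the help text is paraphrased from the
// PR discussion, not copied from the merged code.
func parseScaleFromUnschedulable(args []string) (bool, error) {
	fs := flag.NewFlagSet("scale-from-unschedulable-sketch", flag.ContinueOnError)
	scaleFromUnschedulable := fs.Bool("scale-from-unschedulable", false,
		"Ignore a node's .spec.unschedulable field in node templates when considering whether to scale a node group.")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *scaleFromUnschedulable, nil
}

func main() {
	for _, args := range [][]string{nil, {"--scale-from-unschedulable"}, {"--scale-from-unschedulable=false"}} {
		enabled, err := parseScaleFromUnschedulable(args)
		fmt.Println(args, enabled, err)
	}
}
```

The flag package treats one and two leading dashes the same, and a bare boolean flag sets the value to true, so the default stays off unless the operator opts in.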

@jackfrancis
Contributor

/label tide-merge-method-squash
/lgtm
/approve
/hold cancel

thanks @elmiko!

@k8s-ci-robot
Contributor

@jackfrancis: The label(s) /label tide-merge-method-squash cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor, ci-short, ci-extended, ci-full. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label tide-merge-method-squash
/lgtm
/approve
/hold cancel

thanks @elmiko!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko, jackfrancis, towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis
Contributor

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 28, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2cd7445 into kubernetes:master Oct 28, 2025
7 checks passed
@elmiko elmiko deleted the unschedulable-nodes-fix branch October 29, 2025 12:58
elmiko added a commit to elmiko/kubernetes-autoscaler that referenced this pull request Oct 29, 2025
…nodes (kubernetes#8520)

* pass allNodes to node info provider Process

This change passes all the nodes to the mixed node info provider
processor that is called from `RunOnce`. The change is to allow
unschedulable and unready nodes to be processed as bad candidates during
the node info template generation.

The Process function has been updated to separate nodes into good and
bad candidates to make the filtering match the original intent.

* add --scale-from-unschedulable flag

This change introduces a flag which will instruct the CA to ignore a
node's `.spec.unschedulable` field when creating node templates for
considering which node group to scale.
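The good/bad candidate split described in the first commit can be sketched as a two-pass selection: good candidates claim their node group first, and bad (unready or unschedulable) candidates are used only for groups still lacking a template. This assumes a hypothetical simplified Node with a Group field standing in for the cloud provider's node group lookup, and a simplified isGoodTemplateCandidate; the autoscaler's own check may consider more conditions.

```go
package main

import "fmt"

// Node is a hypothetical, simplified model; Group stands in for the node
// group the cloud provider reports for the node.
type Node struct {
	Name          string
	Group         string
	Ready         bool
	Unschedulable bool
}

// isGoodTemplateCandidate is a simplified, hypothetical stand-in for the
// autoscaler's own candidate check.
func isGoodTemplateCandidate(n Node) bool {
	return n.Ready && !n.Unschedulable
}

// pickTemplateNodes picks one template node per group. Good candidates win;
// bad candidates are used only for groups that have no good candidate.
func pickTemplateNodes(allNodes []Node) map[string]string {
	templates := make(map[string]string)
	var bad []Node
	for _, n := range allNodes {
		if !isGoodTemplateCandidate(n) {
			bad = append(bad, n) // defer to the last-resort pass
			continue
		}
		if _, ok := templates[n.Group]; !ok {
			templates[n.Group] = n.Name
		}
	}
	// Last resort: groups whose nodes are all unready or unschedulable.
	for _, n := range bad {
		if _, ok := templates[n.Group]; !ok {
			templates[n.Group] = n.Name
		}
	}
	return templates
}

func main() {
	nodes := []Node{
		{Name: "b", Group: "g1", Ready: true, Unschedulable: true},
		{Name: "a", Group: "g1", Ready: true},
		{Name: "c", Group: "g2", Ready: true, Unschedulable: true},
	}
	fmt.Println(pickTemplateNodes(nodes)) // map[g1:a g2:c]
}
```

Because bad candidates are deferred rather than skipped, a group made up entirely of cordoned nodes (g2 above) still gets a template, which is the scenario from the linked issue.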

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CA potential for skipped node template info when a node group contains only non-ready nodes

4 participants